January 25, 1993

Dr.  Mark  Jacobs 
Program  Director 
AFOSR-NM 
Bolling Air Force Base
Washington, DC 20332-6448

Dear Dr. Jacobs:

Enclosed  please  find  the  final  technical  report  for  the  research  on  "Computational 
Complexity  of  Connectionist  and  Constraint  Networks"  supported  by  grant  AFOSR- 
89-0151.  I  sent  a  similar  document  in  June  as  part  of  a  new  proposal,  but  apparently  it 
was  misplaced.  We  have  several  on-going  experimental  validation  projects  to  test  the 
programs  developed  with  partial  support  from  AFOSR.  When  this  work  is  completed 
we shall send you an update. Our research had a strong impact on a number of areas.
For example, we effectively closed the study of the parallel complexity of local consistency
in constraint networks, and we developed the best known methods for constructing
non-axis-parallel decision trees. I would appreciate any comments you may have on
our  research. 
Final Report: 1992

Complexity of Connectionist and Constraint-Satisfaction Networks

Simon  Kasif 

Department  of  Computer  Science 
The  Johns  Hopkins  University 
Baltimore,  Md  21218 
KASIF@CS.JHU.EDU 
301-338-8296 


1  General  Progress  Summary  and  Overall  Impact 

Since  the  beginning  of  the  funding  of  the  grant,  we  established  a  substantial  effort  in  the 
area  of  connectionist  optimization  algorithms,  relaxation  networks,  and  geometric  learning 
algorithms. All of the above are highly interconnected research projects. We have achieved
several significant results that have increased our understanding of the computational
capabilities and limitations of connectionist and constraint networks. Our most significant
contributions thus far are in the area of parallel complexity of constraint networks,
comparative experimentation with learning algorithms, and geometric concept learning. Our
results in the area of parallel constraint networks are the subject of several publications in
first-rate journals and conferences. Our experimental research achieves the best results on
several well-established benchmarks. Most notably, our group achieved the best results (in
terms  of  prediction  accuracy)  in  the  area  of  protein  folding.  The  technical  results  of  our 
research  investigations  are  summarized  in  the  following  sections. 

2  Theoretical  Analysis  of  Relaxation  Networks 

Relaxation  networks  are  a  special  case  of  constraint  satisfaction  networks  and  have  been 
used  in  optimization,  truth  maintenance  systems,  and  computer  vision.  These  networks 
utilize  local  constraint  propagation  techniques  to  achieve  local  stability  (local  consistency 
in  propositional  constraint  networks).  One  of  the  stated  objectives  of  the  research  proposal 
was to increase our understanding of local optimization (search) techniques in such networks.
In [KD92] we provide a complete characterization of achieving local stability (consistency)
in symbolic relaxation networks. The results of these investigations are reported in a
collection of papers [Kas90, Kas86, Kas89, KD91, KD92].
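
As an illustration of what local constraint propagation means in practice, the following
Python sketch prunes variable domains until every binary constraint is locally consistent,
in the style of the standard sequential AC-3 procedure. It is only a sketch under stated
assumptions: the function name, the data layout, and the example network are hypothetical,
and it does not reproduce the parallel algorithms analyzed in the cited papers.

```python
from collections import deque

def arc_consistency(domains, constraints):
    """Prune domains until every directed arc is locally consistent (AC-3 style).

    domains: dict mapping variable -> set of candidate values (modified in place)
    constraints: dict mapping (x, y) -> predicate(value_of_x, value_of_y),
                 with one entry per directed arc (hypothetical layout)
    Returns False if some domain becomes empty, True otherwise.
    """
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        allowed = constraints[(x, y)]
        # Values of x with no supporting value left in y's domain must go.
        unsupported = {vx for vx in domains[x]
                       if not any(allowed(vx, vy) for vy in domains[y])}
        if unsupported:
            domains[x] -= unsupported
            if not domains[x]:
                return False
            # x's domain shrank, so arcs pointing into x must be rechecked.
            queue.extend((z, w) for (z, w) in constraints if w == x and z != y)
    return True

# Example network (an assumption for illustration): a < b < c over {1, 2, 3}.
less = lambda u, v: u < v
greater = lambda u, v: u > v
domains = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {1, 2, 3}}
constraints = {("a", "b"): less, ("b", "a"): greater,
               ("b", "c"): less, ("c", "b"): greater}
consistent = arc_consistency(domains, constraints)
```

In this small example propagation alone narrows each domain to a single value; in general,
local consistency only removes values that cannot take part in any solution.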

3  Theoretical  Analysis  of  Discrete  Hopfield  Nets 

We  completed  our  investigation  of  Discrete  Hopfield  Nets.  Our  results  are  summarized  in 
[KBDS91, KBDS93]. In addition to obtaining several lower bounds for the complexity of
finding local minima (stable states) in Hopfield networks, we also studied the complexity of
finding a new stable state when the network is perturbed by modifying one of the weights by a
tiny  amount  (one  bit). 
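
To make the notion of a stable state concrete, the following Python sketch runs the usual
asynchronous update rule on a small discrete Hopfield net with symmetric weights, zero
self-connections, and states in {-1, +1}, and then restarts from the old stable state after
a weight is changed. The function names, the example weights, and the size of the
perturbation are assumptions chosen for illustration; the sketch says nothing about the
lower bounds themselves.

```python
def energy(weights, state):
    """Hopfield energy -1/2 * sum_ij w_ij s_i s_j (zero thresholds assumed)."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

def find_stable_state(weights, state):
    """Update units one at a time until no unit disagrees with its input field."""
    state = list(state)
    n = len(state)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            desired = 1 if field >= 0 else -1
            if desired != state[i]:
                state[i] = desired
                changed = True
    return state

# Hypothetical 3-unit network with symmetric weights and a zero diagonal.
weights = [[0, 1, -1],
           [1, 0, -1],
           [-1, -1, 0]]
start = [1, -1, 1]
stable = find_stable_state(weights, start)

# Perturbed setting: change one weight (symmetrically) and restart from the
# previously stable state. The magnitude used here is illustrative only.
perturbed = [row[:] for row in weights]
perturbed[0][1] = perturbed[1][0] = -3
new_stable = find_stable_state(perturbed, stable)
```

Each flip never increases the energy, which is why the sequential procedure terminates;
the complexity questions concern how many such steps may be needed.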


4  Parallel  Problem  Solving  and  State  Space  Search 

Our  group  at  Johns  Hopkins  recently  obtained  a  substantial  result  on  very  large  state-space 
search. The first part of this work attacked the problem of 6-piece chess end-game
analysis, a long-time outstanding problem in computational chess and AI. The chess end-game
analysis  was  done  by  Lewis  Stiller,  a  graduate  student  in  our  AI  group,  who  implemented 
an  extremely  large  search-space  exploration  on  the  Connection  Machine.  His  program  has 
found  6-piece  chess  positions  that  require  more  than  240  moves  to  achieve  a  forced  win.  The 
methodology  is  also  applicable  to  other  search  problems  in  symmetry  groups.  The  results 
are published in [Sti91a, Sti91b, Sti92a, Sti92b]. This work is considered a fundamental
breakthrough in the computational chess community, and has received wide publicity in the
popular press (e.g., Scientific American, London Times, Washington Post).

As  a  side  project  we  have  discovered  an  efficient  parallel  algorithm  to  compute  string 
statistics for molecular-dynamics simulations [Sti92b]. The algorithm is currently being used
on the CM2 in Los Alamos.

Our  research  program  in  parallel  AI  programming  also  had  an  important  educational 
side-effect. The PI developed a regular programming workshop on parallel programming of the
Connection Machine. Graduate and undergraduate students implemented many diverse
applications on the CM, such as semantic networks, constraint networks, path planning with
obstacles, geometric algorithms, a cellular automata model of the heart, psychophysical
models, and more. Subsequently, many of these students spent summers at national
laboratories such as NRL, SRC, and Los Alamos working on important applications.


5  Design and Analysis of Algorithms for Machine Learning

With partial funding from the grant we started a set of new investigations of machine
learning  algorithms.  The  results  of  these  studies  are  documented  below. 

5.1  Comparative  Studies  of  Learning  Methods 

We are conducting a comprehensive set of experiments that compare (on a range of
real-world applications) the performance of backpropagation algorithms to nearest neighbor
learning methods and other methods. The experiments are performed by David Heath
(funded by the project) and Scott Cost and are co-supervised by Steven Salzberg and the
PI. Salzberg studied the comparative effectiveness of the backpropagation algorithm and
an instance-based learning algorithm on the problem of predicting protein folding. The
results are documented in [CS90b]. A comprehensive comparative study of connectionist
learning and instance-based methods has been completed and is the subject of a journal
paper [CS90a].

5.2  Experimental  Analysis  of  Backpropagation 

We completed experimentation with two heuristic approaches to improve the efficiency
and accuracy of backpropagation in connectionist networks. The first approach is based
on  a  scaling  (multiple  resolution)  approach  to  learning.  Instead  of  teaching  the  network 
the  exact  concept,  we  train  the  network  on  a  series  of  approximations  (refinements)  of  the 
concept.  The  first  approximation  is  very  crude,  and  therefore  we  allow  a  very  large  error. 
As  the  approximations  become  more  accurate,  we  require  increasing  accuracy  from  the 
network. The preliminary experiments using this strategy were relatively disappointing in
terms of improving performance for various classes of problems. While we found applications
where the method improves performance (measured as the number of iterations to learn
the concept class), in a majority of cases the new algorithm was comparable in speed and
accuracy to the old one. Our findings will appear in a forthcoming report.


5.3  Geometric  Concept  Learning 

We (as well as many others) observed that there are fundamental links between geometric
partitioning algorithms and machine learning. We used combinatorial techniques and
computational geometry to study basic properties of learning algorithms. This investigation
is the main topic of a Ph.D. thesis by David Heath. David Heath's research has been
supervised by Kasif and was fully funded by AFOSR. Preliminary results from his thesis
have been published in [HKK+91, SHDK91, Hea91, HKS92]. Below we briefly describe these
topics.


5.4  Limited  Memory  Learning 

The majority of learning algorithms (including connectionist algorithms) use limited
memory (a fixed size network) during learning. We addressed the problem of the complexity of
learning  when  the  algorithm  is  allowed  to  store  only  the  generalization.  It  cannot  store  the 
entire  set  of  examples  and  process  them  off-line.  One  way  to  abstract  this  restriction  is  by 
enforcing  a  limited  memory  requirement.  That  is,  the  algorithm  is  restricted  to  some  small 
number of memory locations. We recently completed a paper where we establish fundamental
bounds on the number of steps necessary and sufficient to learn concepts when the
learning module is not allowed to store all examples. The paper is the first of its kind to
establish a fundamental trade-off between the number of steps required to learn a concept and
the size of the generalization used by the algorithm. Our results are summarized in [HKK+91].


5.5  Learning  with  Helpful  Teacher 

Research  in  theoretical  machine  learning  focuses  on  the  complexity  of  learning  a  concept 
class  independent  of  particular  learning  architectures.  In  general,  this  research  does  not 
address  the  following  important  questions.  What  are  the  strengths  and  weaknesses  of  a 
learning  algorithm  when  applied  to  a  particular  learning  problem?  How  do  particular 
algorithms compare with each other? What is the right way to teach a given concept to a
particular learning machine? Does providing additional examples always help learn a given
concept? 

We address some of these questions in the context of nearest neighbor and other learning
methods.  A  theoretical  model  is  introduced  where  the  teacher  knows  the  learning  algorithm 
and  chooses  examples  in  the  best  way  possible.  We  call  this  the  Helpful  Teacher  model,  and 
note  that  it  often  requires  a  very  small  number  of  examples  to  learn  a  concept.  We  prove 
some lower and upper bounds for a variety of geometric concept classes. This is joint work
with Art Delcher, David Heath, and Steven Salzberg. Our initial findings on this topic are
summarized in [SHDK91], where we also discuss the implications of our results for current
experimental  research. 
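
A minimal example of the Helpful Teacher idea, assuming a one-dimensional threshold
concept and a 1-nearest-neighbor learner: because the teacher knows the learner labels a
query by its closest stored example, two examples placed symmetrically around the
threshold suffice to teach the concept. The function names and the `margin` parameter
below are hypothetical and serve only this illustration.

```python
def nearest_neighbor_predict(examples, x):
    """Label x with the label of the closest stored example (1-NN on the real line)."""
    point, label = min(examples, key=lambda example: abs(example[0] - x))
    return label

def helpful_teacher_examples(threshold, margin=1.0):
    """Choose two examples so that the 1-NN decision boundary (their midpoint)
    coincides with the threshold of the target concept x >= threshold."""
    return [(threshold - margin, False), (threshold + margin, True)]

threshold = 0.5
examples = helpful_teacher_examples(threshold)

# Queries avoid the boundary point itself, where the two examples tie.
queries = [-3.0, 0.0, 0.4, 0.6, 2.0, 10.0]
predictions = [nearest_neighbor_predict(examples, x) for x in queries]
```

With examples drawn at random instead, the midpoint between the nearest negative and
positive examples would generally miss the threshold, so many more examples would be
needed to approximate the same boundary.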

We also addressed the problem of the complexity of being a Helpful Teacher. We show
that presenting the best possible set of examples for geometric problems
is NP-hard. In the process of solving this problem we produced several new computational
geometry results [Hea91, D.92].

5.6  Learning  Oblique  Decision  Trees 

We recently began investigating the following natural generalization of perceptrons. Given
two sets of points (e.g., blue/red) in n dimensions and an integer K, find a hyperplane
that partitions the set into two categories such that at most K points are misclassified.
We proved that this problem is NP-complete by transformation from the dual problem, namely:
given a set of N linear inequalities, is there a feasible solution that excludes at most K
inequalities? The dual problem is NP-complete by an easy transformation from 3SAT.
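
The decision problem can be made concrete with a short Python sketch. Checking a given
hyperplane against the bound K takes time linear in the number of points; the hardness lies
in finding such a hyperplane. The function names, the sign convention (blue on the positive
side), and the sample points are assumptions made for this illustration.

```python
def misclassified(w, b, blue, red):
    """Count points on the wrong side of the hyperplane w.x + b = 0.

    Convention assumed here: blue points should satisfy w.x + b > 0 and
    red points w.x + b < 0; points on the hyperplane count as errors.
    """
    def side(p):
        return sum(wi * pi for wi, pi in zip(w, p)) + b
    return (sum(1 for p in blue if side(p) <= 0)
            + sum(1 for p in red if side(p) >= 0))

def separates_within(w, b, blue, red, k):
    """Verify a candidate answer: does (w, b) misclassify at most k points?"""
    return misclassified(w, b, blue, red) <= k

# Hypothetical data in the plane; one blue and one red point are outliers.
blue = [(2, 2), (3, 1), (0, 0)]
red = [(-1, -1), (-2, 0), (3, 3)]
w, b = (1, 1), -1
errors = misclassified(w, b, blue, red)
```

Here the line x + y = 1 misclassifies exactly the two outliers, so it is a valid answer
for K = 2 but not for K = 1.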

This  result  should  be  contrasted  with  the  existing  complexity  results  on  the  scalability 
of learning in networks. The majority of the NP-completeness results, such as Judd's and Blum
and Rivest's, address the representation issue: that is, finding a boolean function
(threshold circuit) of a certain architecture that is consistent with the training set. But when
the network does backpropagation, it usually stops when a certain accuracy is achieved,
or alternatively, in the case of boolean functions, when the network is partially consistent
with the training set. The question above directly addresses the problem of learning a function
with  a  given  accuracy.  Our  complexity  result  shows  that  this  problem  is  difficult  even  for  a 
single  hyperplane. 

We became interested in learning with given accuracy when we were thinking about geometric
versions of decision trees, where the decision surfaces are allowed to be linear combinations
of features. Such decision trees generate space partitioning by convex polyhedra. The
majority of practical work on decision trees seems to address axis-parallel decision trees
(e.g., ID3). We recently developed a new algorithm for decision tree induction based on
the  approximate  constraint  solving  paradigm.  We  developed  a  general  constraint  solving 
technique that allows us to solve approximate linear programming problems. A simple
instance of this class of problems is finding a solution to a set of linear inequalities such that
the largest number of such inequalities is satisfied. Our algorithm allows us to synthesize
the smallest known decision trees for several applications such as breast cancer diagnosis
and classification of astronomical data while preserving the prediction accuracy of previous
methods. Our results are described in [HKS92].
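
To illustrate the difference between oblique and axis-parallel tests, the following Python
sketch classifies points with a decision tree whose internal nodes test a linear
combination of the features. An axis-parallel node is the special case in which only one
weight is non-zero. The tuple layout, the function name, and the example trees are
assumptions for illustration only and do not reproduce the induction algorithm of [HKS92].

```python
def classify(tree, point):
    """Walk an oblique decision tree and return the class label at the leaf.

    Internal node (hypothetical layout): (weights, bias, below_subtree, above_subtree).
    Leaf: any non-tuple value, used as the class label.
    """
    while isinstance(tree, tuple):
        weights, bias, below, above = tree
        value = sum(w * x for w, x in zip(weights, point)) + bias
        tree = above if value > 0 else below
    return tree

# One oblique node separates the plane along the line x + y = 1.
oblique_tree = ((1.0, 1.0), -1.0, "red", "blue")

# An axis-parallel node can only test a single feature, here x > 0.5.
axis_parallel_tree = ((1.0, 0.0), -0.5, "red", "blue")

label_oblique = classify(oblique_tree, (0.9, 0.3))
label_axis = classify(axis_parallel_tree, (0.2, 5.0))
```

A concept bounded by a diagonal line needs a single oblique node, whereas an axis-parallel
tree can only approximate it with a staircase of single-feature tests, which is one reason
oblique trees can be much smaller.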

6  Summary 

To  summarize,  we  have  developed  a  strong  research  program  in  several  areas.  Our  main 
contributions  thus  far  are  in  the  area  of  parallel  complexity  of  constraint  networks,  and 
basic  properties  of  geometric  partitioning  algorithms  with  applications  to  learning. 

Our results in the area of parallel constraint networks are the subject of several
publications in first-rate journals and conferences. Our group's experimental research achieves
the best results on several well-established benchmarks. Most notably, we achieved the best
performance (in terms of prediction accuracy) in the area of protein folding, improving on
the backpropagation algorithm. We studied several fundamental geometric partitioning
problems that have direct implications on lower and upper bounds on the number of
examples needed to learn geometric concepts. We introduced a new model that allows us
to derive lower bounds on the number of examples required for learning geometric
concepts. The model also allows us to study the computational complexity of teaching rather
than  learning.  We  also  derived  the  simplest  known  complexity  result  for  the  scalability  of 
learning  in  networks,  namely  approximate  learning  of  a  perceptron.  We  developed  a  set 
of programs that construct non-axis-parallel decision trees which are significantly smaller
than  previously  constructed  trees  for  the  same  domains. 